
[R3]: Move to new vLLM routed experts format #2487

Merged: S1ro1 merged 38 commits into main from feat/r3-v3-routed-experts on May 22, 2026

@S1ro1 S1ro1 commented May 13, 2026

PR is ready - routed-experts vLLM pin rebuilt from v0.21.0 + revert #42434 + PR39568

This branch moves prime-rl to the routed-experts response path used by current renderers/verifiers main, without the verifier-side renderer bypass. The x86_64 vLLM pin now points at a PrimeIntellect v0.5.0 release asset built from upstream vLLM releases/v0.21.0 plus upstream revert vllm-project/vllm#42434 and upstream PR vllm-project/vllm#39568.

The previous PR39568-only wheel still contained the PR39917 validation that rejected enable_return_routed_experts with async scheduling during inference startup. This new wheel removes that validation via revert #42434 while keeping the PR39568 routed-experts transfer path.

  • keep routed-experts data opaque through verifiers during token truncation; prime-rl decodes at the orchestrator boundary
  • preserve this branch's routed-experts source of truth: RoutedExperts(data, shape, dtype), explicit dtype maps, and _pack_routed_experts / _unpack_routed_experts for multi-turn stitching
  • update trainer packing/loading to slice, append, pad, and reconstruct the RoutedExperts transport struct with torch.frombuffer
  • pin vllm-router to 0.1.25 for the matching raw-uint8 schema and add pybase64
  • pin x86_64 vLLM to vllm-0.21.0+cu129.r42434.pr39568.a106aa6-cp38-abi3-manylinux_2_24_x86_64.whl
  • update deps/renderers to main 3ae276c (renderers-v0.1.8.dev25), including routed-experts sidecar parsing and fastokens>=0.2.0
  • update deps/verifiers to main 521d436c (v0.1.15.dev8-1-g521d436c), including the routed-experts response sidecar support and verifier helper cleanup
  • add the fastokens exclude-newer exemption and lock fastokens==0.2.0 to satisfy current renderers main
  • disable vLLM async scheduling only for the NIXL routed-experts capture path, where async scheduling leaves placeholder sampled-token state during capture
  • fix rendered multi-node orchestrator args to use the student client config keys
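
The transport struct and multi-turn stitching named in the list above could look roughly like the sketch below. This is an illustration, not the code in this branch: the `(num_tokens, num_layers, top_k)` layout, the `DTYPE_ITEMSIZE` map, and the `stitch` helper are all assumptions chosen for the example, and only the `RoutedExperts(data, shape, dtype)` field names come from the description.

```python
from dataclasses import dataclass

# Hypothetical explicit dtype map: dtype name -> bytes per element.
DTYPE_ITEMSIZE = {"uint8": 1, "int16": 2, "int32": 4}


@dataclass(frozen=True)
class RoutedExperts:
    """Opaque routed-experts payload: raw bytes plus the metadata to decode them."""

    data: bytes                    # row-major buffer
    shape: tuple[int, int, int]    # assumed (num_tokens, num_layers, top_k)
    dtype: str                     # key into DTYPE_ITEMSIZE

    def __post_init__(self) -> None:
        # Reject payloads whose byte length disagrees with shape and dtype.
        count = 1
        for dim in self.shape:
            count *= dim
        expected = count * DTYPE_ITEMSIZE[self.dtype]
        if len(self.data) != expected:
            raise ValueError(f"expected {expected} bytes, got {len(self.data)}")


def stitch(turns: list[RoutedExperts]) -> RoutedExperts:
    """Concatenate per-turn payloads along the token axis (hypothetical helper)."""
    first = turns[0]
    for turn in turns[1:]:
        if turn.dtype != first.dtype or turn.shape[1:] != first.shape[1:]:
            raise ValueError("turns disagree on dtype or per-token layout")
    total_tokens = sum(turn.shape[0] for turn in turns)
    # Row-major layout means token-axis concatenation is plain byte concatenation.
    data = b"".join(turn.data for turn in turns)
    return RoutedExperts(data, (total_tokens, *first.shape[1:]), first.dtype)
```

Because the token axis is the leading dimension in this assumed layout, stitching never needs to decode the payload, which is what lets intermediate components treat it as opaque.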

Related PRs

Verification

  • uv sync --all-extras --locked
  • uv lock --locked
  • uv run ruff check .
  • uv run ruff format --check --config=pyproject.toml
  • PYTEST_OUTPUT_DIR=/tmp/outputs uv run pytest tests/unit -m "not gpu" - 454 passed, 65 deselected, 35 warnings
  • Wheel smoke import: vllm.__version__ == "0.21.0+cu129.r42434.pr39568.a106aa6", ModelRunnerOutput.routed_experts present, _validate_return_routed_experts absent, and the old async-scheduling validation string absent.
  • Uploaded the new r42434.pr39568.a106aa6 vLLM wheel to the prime-rl v0.5.0 release and removed the stale PR39568-only wheel asset that failed inference startup.

Note

Medium Risk
Medium risk because this changes the routed_experts wire/transport format end-to-end (inference response → orchestrator → trainer) and adds a vLLM monkey-patch for NIXL disaggregated inference, which could break router-replay or inference compatibility if assumptions drift.

Overview
Updates prime-rl to the new vLLM routed-experts schema by exporting routing decisions as a compact base64-encoded uint8 byte payload (plus shape) from /inference/v1/generate, and decoding/packing it at the orchestrator boundary instead of propagating nested Python lists.
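
As a rough illustration of the wire format described above, the following sketch round-trips routing decisions through a base64-encoded uint8 payload plus shape. The `data` and `shape` key names, the `(num_tokens, num_layers, top_k)` layout, and both function names are assumptions for the example; it also uses the standard-library `base64` module rather than `pybase64` to stay dependency-free.

```python
import base64


def encode_payload(experts: list[list[list[int]]]) -> dict:
    """Flatten [token][layer][top_k] expert ids (0..255) into base64 bytes plus shape."""
    num_tokens = len(experts)
    num_layers = len(experts[0]) if num_tokens else 0
    top_k = len(experts[0][0]) if num_layers else 0
    # bytes() raises ValueError for ids outside 0..255, matching the uint8 limit.
    flat = bytes(e for token in experts for layer in token for e in layer)
    return {
        "data": base64.b64encode(flat).decode("ascii"),
        "shape": [num_tokens, num_layers, top_k],
    }


def decode_payload(payload: dict) -> list[list[list[int]]]:
    """Inverse of encode_payload; checks the byte count against the shape."""
    raw = base64.b64decode(payload["data"], validate=True)
    num_tokens, num_layers, top_k = payload["shape"]
    if len(raw) != num_tokens * num_layers * top_k:
        raise ValueError("payload length does not match shape")
    return [
        [
            list(raw[(t * num_layers + layer) * top_k:(t * num_layers + layer + 1) * top_k])
            for layer in range(num_layers)
        ]
        for t in range(num_tokens)
    ]
```

Compared with nested Python lists serialized as JSON, a single base64 string keeps the response compact and lets downstream code defer decoding until it actually needs the values.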

Introduces a RoutedExperts msgpack transport struct and refactors trainer batching/packing to slice/append/pad routed-experts using raw bytes, reconstructing tensors via torch.frombuffer. Adds config validation to reject router replay when inference.kv_cache_offload is enabled, and includes a vLLM __post_init__ monkey-patch to allow routed-experts capture with the NixlConnector in P/D disaggregated inference.
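
The slice and pad operations on raw bytes mentioned above can be sketched without decoding the payload. The helper names, the `(num_tokens, num_layers, top_k)` layout, and zero as the pad value are assumptions made for this example; the final `torch.frombuffer` step is left out to keep the sketch dependency-free.

```python
def _row_bytes(shape: tuple[int, int, int], itemsize: int) -> int:
    """Bytes occupied by one token's routing decisions."""
    _, num_layers, top_k = shape
    return num_layers * top_k * itemsize


def slice_tokens(data: bytes, shape: tuple[int, int, int],
                 start: int, stop: int, itemsize: int = 1):
    """Keep tokens [start, stop) by slicing the byte buffer directly."""
    num_tokens = shape[0]
    start = min(max(start, 0), num_tokens)
    stop = max(start, min(stop, num_tokens))
    row = _row_bytes(shape, itemsize)
    return data[start * row:stop * row], (stop - start, shape[1], shape[2])


def pad_tokens(data: bytes, shape: tuple[int, int, int],
               target_tokens: int, itemsize: int = 1, pad_byte: int = 0):
    """Right-pad along the token axis with pad_byte (assumed pad value)."""
    num_tokens = shape[0]
    if target_tokens < num_tokens:
        raise ValueError("target_tokens is smaller than the current length")
    row = _row_bytes(shape, itemsize)
    padding = bytes([pad_byte]) * ((target_tokens - num_tokens) * row)
    return data + padding, (target_tokens, shape[1], shape[2])
```

Both helpers return the new buffer together with its updated shape, so the pair can be placed back into a transport struct and turned into a tensor only once, at the point of use.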

Bumps dependency pins to match the new protocol (vllm-router 0.1.25, x86_64 vLLM wheel), adds pybase64, and updates/extends unit tests to cover the new serialization and transport behavior.

Reviewed by Cursor Bugbot for commit c3ffa15.

@S1ro1 S1ro1 force-pushed the feat/r3-v3-routed-experts branch from bf79561 to 721a874 Compare May 13, 2026 12:13
@S1ro1 S1ro1 force-pushed the feat/r3-v3-routed-experts branch from bc91c30 to e55328f Compare May 14, 2026 14:09
@S1ro1 S1ro1 force-pushed the feat/r3-v3-routed-experts branch from e55328f to 1fea38e Compare May 14, 2026 14:13
@S1ro1 S1ro1 marked this pull request as ready for review May 14, 2026 15:52
Comment thread src/prime_rl/orchestrator/trajectories.py
Comment thread src/prime_rl/trainer/rl/data.py
samsja
samsja previously approved these changes May 15, 2026
Comment thread pyproject.toml Outdated
S1ro1 and others added 6 commits May 16, 2026 03:08
* Guard checkpoint disk metrics mkdir

* Remove test_trainer_utils.py per review feedback

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Simplify ckpt disk metrics guard

Drop the rank-0 gate and the disk_usage path fallback per review feedback.
Catching FileExistsError on mkdir is sufficient: every rank that races on
mkdir either wins or harmlessly catches the BeeGFS race, and shutil.disk_usage
can then operate on the now-existing ckpt_dir.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
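
The simplified guard described in the commit message above can be sketched as follows. The function name and the use of `exist_ok=True` are assumptions for illustration; only the idea of catching `FileExistsError` on `mkdir` and then calling `shutil.disk_usage` comes from the message.

```python
import shutil
from pathlib import Path


def ckpt_disk_usage(ckpt_dir: Path):
    """Create ckpt_dir if needed on any rank, then report disk usage for it."""
    try:
        ckpt_dir.mkdir(parents=True, exist_ok=True)
    except FileExistsError:
        # Another rank won the race; the directory now exists, so carry on.
        pass
    # Returns a named tuple with total, used, and free bytes.
    return shutil.disk_usage(ckpt_dir)
```

No rank-0 gate is needed in this form, since every rank either creates the directory or finds it already present before querying usage.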
@S1ro1 S1ro1 force-pushed the feat/r3-v3-routed-experts branch 2 times, most recently from 64d3f2c to cb3c559 Compare May 19, 2026 15:46
@S1ro1 S1ro1 force-pushed the feat/r3-v3-routed-experts branch from cb3c559 to 4402d7e Compare May 19, 2026 15:53
S1ro1 added 2 commits May 21, 2026 20:45
…erts

# Conflicts:
#	pyproject.toml
#	src/prime_rl/inference/patches.py
#	src/prime_rl/inference/vllm/serving_chat_with_tokens.py
#	src/prime_rl/inference/vllm/serving_tokens.py
#	src/prime_rl/orchestrator/trajectories.py
#	src/prime_rl/trainer/batch.py
#	src/prime_rl/trainer/rl/data.py
#	tests/unit/orchestrator/test_batch.py
#	uv.lock
Comment thread src/prime_rl/trainer/batch.py
@S1ro1 S1ro1 force-pushed the feat/r3-v3-routed-experts branch from 11fe0ad to de71036 Compare May 21, 2026 21:31
Comment thread src/prime_rl/orchestrator/trajectories.py Outdated
@S1ro1 S1ro1 changed the title from "feat: wire r3 v3 routed experts replay" to "[R3]: Move to new vLLM routed experts format" May 22, 2026
Comment thread src/prime_rl/trainer/batch.py
@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit fbf16de.

Comment thread src/prime_rl/orchestrator/trajectories.py
mikasenghaas
mikasenghaas previously approved these changes May 22, 2026
@S1ro1 S1ro1 merged commit 1e0fe96 into main May 22, 2026
18 checks passed
3 participants